Impact of Data Balancing and Feature Selection on Machine Learning-based Network Intrusion Detection

Abstract

Unbalanced datasets are a common problem in supervised machine learning. An unbalanced dataset leads the model to a deeper understanding of the majority classes; therefore, the learning model is more effective at recognizing the majority classes than the minority classes. Imbalanced data, such as disease data and networking data, emerges naturally in real life. DDoS is one of the network intrusions found to happen more often than R2L. There is an imbalance in the composition of network attacks in public Intrusion Detection System (IDS) datasets such as NSL-KDD and UNSW-NB15. Researchers have proposed many techniques to transform such data into balanced data by duplicating the minority class and producing synthetic data. The Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) algorithms duplicate the data and construct synthetic data for the minority classes. Meanwhile, machine learning algorithms can capture the labeled data's pattern by considering the input features. Unfortunately, not all input features have an equal impact on the output (predicted class or value). Some features are interrelated and misleading. Therefore, the important features should be selected to produce a good model. In this research, we implement the recursive feature elimination (RFE) technique to select important features from the available dataset. According to the experiment, SMOTE provides a better synthetic dataset than ADASYN for the UNSW-NB15 dataset, which has a high level of imbalance. RFE feature selection slightly reduces the model's accuracy but improves the training speed. The Decision Tree classifier consistently achieves a better recognition rate than Random Forest and KNN.
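The idea behind SMOTE-style oversampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and not a library API: the function name `smote_like_oversample`, the toy data, and the parameter defaults are all assumptions made for illustration. Each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 8 majority rows and 3 minority rows with 2 features.
majority = [(float(x), float(x % 3)) for x in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic rows to match the majority class size.
new_points = smote_like_oversample(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic row is an interpolation between two real minority rows, it stays inside the region the minority class already occupies. ADASYN follows a similar interpolation step but, as its name suggests, adapts how many synthetic rows are generated around each minority sample.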

Similar resources

An empirical evaluation for the intrusion detection features based on machine learning and feature selection methods

Despite the great developments in information technology, particularly the Internet, computer networks, global information exchange, and its positive impact in all areas of daily life, it has also contributed to the development of penetration and intrusion which forms a high risk to the security of information organizations, government agencies, and causes large economic losses. There are many ...

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

Euclidean-based Feature Selection for Network Intrusion Detection

Nowadays, data mining plays an important role in various disciplines of science and technology. For computer security, data mining is introduced to help intrusion detection systems (IDS) detect intruders correctly. However, one of the essential procedures of data mining is feature selection, which is the technique (commonly used in machine learning) for selecting a subs...

Linear Correlation-Based Feature Selection For Network Intrusion Detection Model

Feature selection is a preprocessing phase of machine learning that increases classification accuracy and reduces complexity. However, the increase of data dimensionality poses a challenge to many existing feature selection methods. This paper formulates and validates a method for selecting an optimal feature subset based on the analysis of the Pearson correlation coefficients. We...

Machine Learning for Network Intrusion Detection

Cyber security is an important and growing area of data mining and machine learning applications. We address the problem of distinguishing benign network traffic from malicious network-based attacks. Given a labeled dataset of some 5M network connection traces, we have implemented both supervised (Decision Trees, Random Forests) and unsupervised (Local Outlier Factor) learning algorithms to sol...

Journal

Journal title: JOIV: International Journal on Informatics Visualization

Year: 2023

ISSN: 2549-9610, 2549-9904

DOI: https://doi.org/10.30630/joiv.7.1.1041